
Vizier output transforms #2643

Draft · wants to merge 24 commits into main

Conversation

@CompRhys (Contributor) commented Dec 3, 2024

Motivation

I wanted to experiment with the Vizier output transforms in BoTorch, in particular the infeasible transform. This may be misplaced effort, since within the Meta ecosystem Ax appears to handle this sort of thing, so these transforms may not end up being used very much, but it seemed worth upstreaming.

Takes the transforms from: https://arxiv.org/abs/2408.11527

Test Plan

Added tests for all functionality; not sure I am at 100% coverage.

Related PRs

This branch takes off from #2636, so that would need to be merged first.

@facebook-github-bot added the CLA Signed label Dec 3, 2024
@saitcakmak (Contributor) left a comment:


Thanks for adding these. I've been curious to try out the Vizier transforms. These transforms try to ensure we model the good regions better, which seems better than trying to model everything equally well, when the goal is to optimize a function. Have you tried these on any benchmarks yet? It'd be good to see how they work e2e.

None of these transforms implement the untransform_posterior method, which is necessary for the transforms to work as an outcome_transform in BoTorch models. Without it, Model.posterior will error out. Would it be possible to implement untransform_posterior, likely returning a TransformedPosterior object?

None of the transforms seem to support Yvar either. We use BoTorch through Ax and try to keep the same methods regardless of whether the noise is observed. What does Vizier do with observation noise? Is it simply ignored, or is this just a limitation of the current implementation?

Comment on lines 896 to 899
assert transform._batch_shape == batch_shape
assert not transform._is_trained
assert transform._shift is None
assert torch.isnan(transform.warped_bad_value)
Contributor


We tend to use self.assertEqual(...), self.assertTrue/False(...), etc. from TestCase rather than bare assert in tests.
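
For example, the quoted assertions would become:

    self.assertEqual(transform._batch_shape, batch_shape)
    self.assertFalse(transform._is_trained)
    self.assertIsNone(transform._shift)
    self.assertTrue(torch.isnan(transform.warped_bad_value))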

(Three resolved review threads on botorch/models/transforms/outcome.py are collapsed; two are marked outdated.)
Comment on lines 900 to 903
if Yvar is not None:
    raise NotImplementedError(
        "InfeasibleTransform does not support transforming observation noise"
    )
Contributor


It'd be great to support Yvar as well. Could we just set it to the mean of all Yvars? That would only minimally affect the model predictions and let the code run.
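
As a rough sketch of that suggestion (my reading of it, assuming Yvar has shape batch_shape x n x m), the quoted check could be replaced with something like:

    if Yvar is not None:
        # use the per-output mean noise level for every observation
        Yvar = Yvar.mean(dim=-2, keepdim=True).expand_as(Yvar)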

(Another resolved, outdated review thread on botorch/models/transforms/outcome.py is collapsed.)
y = Y_transformed[*batch_idx, :, dim]

# Get finite values and their ranks for each batch
is_finite_mask = ~torch.isnan(y)
Contributor


BoTorch models do not support NaN inputs. If we didn't need to deal with NaN and somehow avoided unique, could we avoid the double for loop here?

Contributor Author


I think this is the salient point for me, and it indicates that I should probably have done this in Ax. Would Ax error out if a metric returned NaN before applying transforms, or only after?

Contributor


I believe the error would come after, though I would have to double check. IIRC, the NaN will propagate down to BoTorch and raise an error there.

I am planning a major refactor of data handling within the Ax ModelBridge layer & transforms. The goal is to reduce looping to make it more efficient & scalable. That said, if you want to add these in Ax now, it doesn't have to be blocked on the refactor.

Comment on lines 1176 to 1177
for i, val in enumerate(unique_y):
    ranks[y == val] = i + 1
Contributor


If we didn't care about uniqueness, we could get the rank / order using torch.argsort. IMO, duplicate observations should be pretty rare in practice and are probably safe to ignore; they're probably mostly introduced by the infeasibility warper.
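
As an illustration of the argsort idea (my sketch, not the PR's code):

    import torch

    y = torch.tensor([0.3, -1.2, 0.7, 2.5])
    # 1-based ranks via the double-argsort trick; ties would get distinct, arbitrary ranks
    ranks = torch.argsort(torch.argsort(y)) + 1  # tensor([2, 1, 3, 4])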

Contributor


I guess we could also find the median using the middle index if we simply sorted Y from the start.
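
Something along these lines (illustrative only):

    y_sorted, _ = torch.sort(y)
    median = y_sorted[(y_sorted.numel() - 1) // 2]  # lower middle element when n is even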

Contributor Author


I think it's worth having this generality in case other packages like BoFire want to make use of these methods. I tried to stick as close to the reference implementation as possible while expanding it to handle multiple output dimensions and batch_size.

(A resolved review thread on botorch/models/transforms/outcome.py is collapsed.)

# Process values below median
below_median = y < self._original_label_medians[*batch_idx, dim]
if below_median.any():
Contributor


This is quite a complicated process to untransform. I find it quite difficult to review since I don't know what the algorithm is supposed to do.

Contributor Author


The HalfRankTransform splits the data at the median and then warps the bottom half onto a Gaussian whose standard deviation matches that of the top half of the distribution. On inverting there are three cases: a) the value is very close to a known value from the fit, so we can just look it up; b) inside the support, we take a linear interpolation; c) outside the support, we take a linear extrapolation based on the entire data range.
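
To make the three cases concrete, here is a minimal sketch of that inverse logic (my own illustration, not the PR's implementation; `warped` and `original` are assumed to be 1-d tensors of the fitted warped values and the corresponding original labels, sorted by `warped`):

    import torch

    def approx_unwarp(y_new, warped, original, atol=1e-6):
        out = torch.empty_like(y_new)
        for i, v in enumerate(y_new):
            diffs = (warped - v).abs()
            j = int(torch.argmin(diffs))
            if diffs[j] < atol:
                # (a) very close to a known fitted value: direct lookup
                out[i] = original[j]
            elif warped[0] < v < warped[-1]:
                # (b) inside the support: linear interpolation between neighbours
                hi = int(torch.searchsorted(warped, v))
                lo = hi - 1
                w = (v - warped[lo]) / (warped[hi] - warped[lo])
                out[i] = original[lo] + w * (original[hi] - original[lo])
            else:
                # (c) outside the support: linear extrapolation over the full data range
                slope = (original[-1] - original[0]) / (warped[-1] - warped[0])
                ref = 0 if v < warped[0] else -1
                out[i] = original[ref] + slope * (v - warped[ref])
        return out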

@CompRhys (Author) commented Dec 3, 2024

> Thanks for adding these. I've been curious to try out the Vizier transforms. These transforms try to ensure we model the good regions better, which seems better than trying to model everything equally well, when the goal is to optimize a function. Have you tried these on any benchmarks yet? It'd be good to see how they work e2e.

I implemented these here before looking into what I would need to do to connect them to the benchmark test case I have been prototyping in Ax. I assumed the best approach would be to implement them lower in the stack and let them bubble up, but it seems I should have implemented them at the Ax level (especially as the harder part here was handling the batch_size), hence the comment about potentially misplaced effort. It may be that these shouldn't be merged here if the details can't be hashed out.

> None of these transforms implement the untransform_posterior method, which is necessary for the transforms to work as an outcome_transform in BoTorch models. Without it, Model.posterior will error out. Would it be possible to implement untransform_posterior, likely returning a TransformedPosterior object?

This should be doable based on the logic in Standardize, just using the sample transform. In general, the bit that makes things annoying here is the batch_size.
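
For reference, a minimal sketch of what that could look like (not the PR's code; `_untransform_samples` is a hypothetical per-sample inverse helper):

    from botorch.posteriors.transformed import TransformedPosterior

    def untransform_posterior(self, posterior):
        # un-warp samples drawn from the transformed-space posterior
        return TransformedPosterior(
            posterior=posterior,
            sample_transform=self._untransform_samples,  # hypothetical helper
        )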

> None of the transforms seem to support Yvar either. We use BoTorch through Ax and try to keep the same methods regardless of whether the noise is observed. What does Vizier do with observation noise? Is it simply ignored, or is this just a limitation of the current implementation?

I think the Vizier approach is similar to performing the transformations at the Ax layer. Their equivalent of Metric only deals with a single float observation, so based on the docs I believe observation noise is simply ignored. I don't think we can implement Yvar for either the LogWarping or the HalfRank transform.

@CompRhys marked this pull request as draft December 3, 2024 20:15
@saitcakmak (Contributor) commented:
Ax vs BoTorch

This is something I've also been thinking about quite a bit lately as part of reworking what Ax uses by default (though this has been rather focused on input transforms so far). Outcome transforms may be slightly simpler in Ax: you technically only need to be able to transform, though we also want some rough untransform method for reporting & cross-validation purposes. In BoTorch, the transform needs to be differentiable, since the untransformed posterior is used to compute the acquisition value, which needs to be differentiated through during optimization. Ax transforms are applied once as a pre-processing step, but BoTorch transforms are applied at each posterior call. There is also the aspect of learnable / tunable transforms in BoTorch, but that doesn't apply to outcome transforms.

If I were to implement Vizier transforms, I'd probably try them in Ax, since there are fewer restrictions.

> In general, the bit that makes things annoying here is the batch_size.

Do you often use batched models? If not, you don't need to support the batch_size arg and different treatment for each batch; you could use the same parameters for all batches.

> I think the Vizier approach is similar to performing the transformations at the Ax layer.

Agreed. It is more of a pre-processing step.

> Their equivalent of Metric only deals with a single float observation, so based on the docs I believe observation noise is simply ignored. I don't think we can implement Yvar for either the LogWarping or the HalfRank transform.

Cool, this is good to know.

@CompRhys (Author) commented Dec 5, 2024

Will move forward with a PR in Ax, as that seems like the faster way to make progress on my use case.

@saitcakmak (Contributor) commented:
Sounds good. StandardizeY or PowerTransformY should be good examples to follow.

3 participants